fix: resolve QA validation deadlock when subtasks are stuck or failed by AndyMik90 · Pull Request #1853 · AndyMik90/Auto-Claude

AndyMik90 · 2026-02-16T08:01:23Z

Summary

Fixes QA validation deadlock where stuck/failed subtasks block QA from ever starting
Adds is_build_ready_for_qa() that considers builds ready when all subtasks reach a terminal state (completed, failed, or stuck)
Updates mark_subtask_stuck() to sync implementation_plan.json status to "failed"
Reorders QA loop to honor human feedback (QA_FIX_REQUEST.md) before checking build completeness
Replaces is_build_complete() with is_build_ready_for_qa() in QA validation paths

Problem

When a coding agent marks a subtask as "stuck" (e.g., "Manual testing on multi-monitor setup"):

get_next_subtask() skips stuck subtasks and returns None → coder exits
should_run_qa() calls is_build_complete() which requires ALL subtasks status == "completed" → returns False
QA never starts, creating a permanent deadlock
Even clicking "proceed with QA" was blocked because the build completeness check ran before the human feedback check

Changes

File	Change
`core/progress.py`	Add `is_build_ready_for_qa()` — checks terminal states (completed/failed/stuck)
`services/recovery.py`	`mark_subtask_stuck()` now also sets `status="failed"` in `implementation_plan.json`
`qa/loop.py`	Reorder: check human feedback before build completeness; use `is_build_ready_for_qa()`
`qa/criteria.py`	`should_run_qa()` uses `is_build_ready_for_qa()` instead of `is_build_complete()`
`cli/qa_commands.py`	Updated error messages for new semantics
`progress.py`	Re-export `is_build_ready_for_qa` in module facade

Test plan

17 new unit tests for is_build_ready_for_qa() covering: all completed, mixed terminal states, stuck subtasks via attempt_history, missing files, empty phases, invalid JSON, graceful fallbacks
3 new tests for mark_subtask_stuck() plan update: verifies status change to "failed", missing subtask handling, missing plan file handling
Updated 5 existing test mock targets from is_build_complete to is_build_ready_for_qa
All 2582 tests pass (pre-commit hook verified)
Ruff lint and format clean

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Clarified QA messaging for builds not ready for QA and updated readiness wording
- Improved detection of terminal vs non-terminal subtask states for QA gating
New Features
- QA readiness now requires all subtasks to be in terminal states (completed, failed, or stuck)
- Persisted "stuck" status and reason into build plans so stuck subtasks are recorded
Tests
- Added comprehensive tests covering readiness, stuck handling, and recovery behaviors

gemini-code-assist · 2026-02-16T08:01:45Z

Summary of Changes

Hello @AndyMik90, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a significant issue in the QA validation workflow where the system could enter a deadlock state if any subtasks were marked as stuck or failed. By introducing a more flexible definition of 'build readiness' that accounts for all terminal subtask states, and by ensuring that human feedback takes precedence in the QA loop, the system can now gracefully handle non-completed subtasks and proceed with QA. Additionally, the change improves data consistency by updating the implementation plan when subtasks become stuck.

Highlights

QA Deadlock Resolution: Fixed a critical QA validation deadlock where subtasks marked as 'stuck' or 'failed' would prevent the QA process from ever starting, leading to an infinite loop.
Enhanced Build Readiness Check: Introduced a new is_build_ready_for_qa() function that considers a build ready for QA if all subtasks have reached any terminal state (completed, failed, or stuck), rather than strictly requiring all to be 'completed'.
Subtask Status Synchronization: Modified mark_subtask_stuck() to automatically update the corresponding subtask's status to 'failed' within implementation_plan.json, ensuring consistency across the system.
QA Loop Logic Reordering: Reordered the QA validation loop to prioritize human feedback (via QA_FIX_REQUEST.md) before performing the build completeness check, allowing manual overrides to proceed.
Refactored QA Validation Paths: Replaced all instances of is_build_complete() with the more robust is_build_ready_for_qa() in relevant QA validation logic.

Changelog

apps/backend/cli/qa_commands.py
- Updated error messages to reflect the new QA readiness criteria, informing users that all subtasks must reach a terminal state.
apps/backend/core/progress.py
- Added is_build_ready_for_qa function, which checks if all subtasks are in a terminal state (completed, failed, or stuck).
apps/backend/progress.py
- Exported is_build_ready_for_qa in the module's public interface.
apps/backend/qa/criteria.py
- Modified should_run_qa to use is_build_ready_for_qa instead of is_build_complete.
apps/backend/qa/loop.py
- Reordered the QA loop to check for human feedback (QA_FIX_REQUEST.md) before evaluating build readiness.
- Replaced is_build_complete with is_build_ready_for_qa in the build readiness check.
apps/backend/services/recovery.py
- Enhanced mark_subtask_stuck to also update the subtask's status to 'failed' in implementation_plan.json.
tests/agents/test_agent_flow.py
- Updated test mocks to reference is_build_ready_for_qa instead of is_build_complete.
tests/test_cli_qa_commands.py
- Modified test assertion for the updated 'Build not ready for QA' message.
tests/test_progress_qa_readiness.py
- Added a new file containing 17 unit tests for the is_build_ready_for_qa function, covering various scenarios.
tests/test_qa_criteria.py
- Updated existing tests to patch is_build_ready_for_qa instead of is_build_complete.
tests/test_recovery.py
- Added 3 new tests for mark_subtask_stuck to verify its interaction with implementation_plan.json updates.

Activity

17 new unit tests were added for the is_build_ready_for_qa() function, covering various terminal states, edge cases, and error handling.
3 new tests were added for mark_subtask_stuck() to validate its new functionality of updating implementation_plan.json.
5 existing test mock targets were updated to reflect the transition from is_build_complete to is_build_ready_for_qa.
All 2582 existing tests passed, and code quality checks (Ruff lint and format) were clean.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-02-16T08:01:48Z

📝 Walkthrough

Walkthrough

The PR introduces is_build_ready_for_qa() to treat subtasks in terminal states (completed, failed, or stuck) as QA-ready, updates QA gating and CLI messaging to use that criterion (skipping it when human feedback exists), persists stuck status into implementation_plan.json, and updates tests accordingly.

Changes

Cohort / File(s)	Summary
Build readiness logic `apps/backend/core/progress.py`, `apps/backend/progress.py`	Adds `_load_stuck_subtask_ids()` and `is_build_ready_for_qa(spec_dir: Path)`; centralizes stuck-subtask loading and re-exports readiness function via `apps/backend/progress.py`.
QA gating & loop `apps/backend/qa/criteria.py`, `apps/backend/qa/loop.py`	Replaces `is_build_complete` with `is_build_ready_for_qa`; loop now skips readiness gate when `QA_FIX_REQUEST.md` (human feedback) exists and removes duplicated checks.
CLI messaging `apps/backend/cli/qa_commands.py`	Updates early-return QA messages to "Build not ready for QA" and clarifies requirement that all subtasks reach a terminal state.
Recovery persistence `apps/backend/services/recovery.py`	When marking a subtask stuck, attempt to atomically update `implementation_plan.json` to set the matching subtask status to `failed` and prepend a "Marked as stuck" note to `actual_output`; tolerant error handling on file/JSON errors.
Tests `tests/agents/test_agent_flow.py`, `tests/test_cli_qa_commands.py`, `tests/test_qa_criteria.py`, `tests/test_progress_qa_readiness.py`, `tests/test_recovery.py`	Updates test patches from `is_build_complete` → `is_build_ready_for_qa`; adds extensive tests for readiness logic (many scenarios, UTF-8/malformed handling) and adds tests for recovery plan updates.

Sequence Diagram(s)

mermaid
sequenceDiagram
rect rgba(200,200,255,0.5)
Participant CLI as CLI/QA Command
Participant Loop as QA Loop
Participant Progress as core.progress
Participant FS as Filesystem (plan/history)
Participant Recovery as RecoveryService
end

CLI->>Loop: trigger QA run
Loop->>FS: check QA_FIX_REQUEST.md (human feedback)
alt human feedback present
    Loop-->>CLI: proceed with QA (skip readiness)
else no human feedback
    Loop->>Progress: is_build_ready_for_qa(spec_dir)
    Progress->>FS: read implementation_plan.json
    Progress->>FS: read attempt_history.json (load stuck IDs)
    alt stuck detected and marking occurs
        Recovery->>FS: write updated implementation_plan.json (mark failed + note)
    end
    Progress-->>Loop: readiness true/false
    Loop-->>CLI: continue or early-return with message
end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

fix: human feedback not processed when QA already approved #41: Related changes to QA human-feedback handling and CLI/loop logic around QA_FIX_REQUEST.md.
fix: prevent OOM, orphaned agents, and unbounded growth during overnight builds #1813: Related to stuck-subtask handling in core.progress and recovery persistence adjustments.
auto-claude: 194-bug-rate-limit-during-task-execution-causes-subtas #1726: Related recovery-area work for persisting and resetting stuck subtask state and plan updates.

Suggested labels

size/M, test

Suggested reviewers

AlexMadera

Poem

🐇 I hopped through plans and JSON hay,
Terminal states now light the way.
Stuck gets marked and written down,
QA gates adjusted, no frown—
Tests cheer loud, the build’s OK!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main fix: resolving a QA validation deadlock caused by stuck or failed subtasks.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/qa-validation-stuck-subtasks

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request effectively resolves a critical deadlock in the QA validation loop by introducing a new is_build_ready_for_qa check that considers failed and stuck subtasks. The logic is sound, and the changes are applied consistently across the relevant modules. The reordering of the QA loop to prioritize human feedback is also a great improvement. The addition of comprehensive unit tests for the new and modified logic is excellent. I have one suggestion regarding data model consistency in services/recovery.py to prevent potential data loss.

gemini-code-assist · 2026-02-16T08:03:21Z

apps/backend/services/recovery.py

+                            subtask["status"] = "failed"
+                            subtask["notes"] = f"Marked as stuck: {reason}"


The introduction of a new notes field here is inconsistent with the Subtask data model defined in implementation_plan/subtask.py. The Subtask class does not have a notes field, which means this data will be lost if the implementation_plan.json is ever loaded and resaved using the ImplementationPlan model objects.

To maintain data model consistency, I recommend using the existing actual_output field, which seems suitable for this kind of information. The Subtask.fail() method already uses this field for failure reasons.

Please also update the new test test_mark_subtask_stuck_updates_plan in tests/test_recovery.py to assert on the actual_output field instead of notes.

Suggested change

subtask["status"] = "failed"

subtask["notes"] = f"Marked as stuck: {reason}"

subtask["status"] = "failed"

subtask["actual_output"] = f"Marked as stuck: {reason}"

apps/backend/core/progress.py

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/test_qa_criteria.py (1)
74-78: 🧹 Nitpick | 🔵 Trivial

Mock module doesn't explicitly set is_build_ready_for_qa — relies on MagicMock auto-attribute.

The module-level mock at line 77 explicitly sets is_build_complete but not is_build_ready_for_qa. Since qa.criteria now imports is_build_ready_for_qa from progress, it silently gets a MagicMock (truthy by default). Tests that forget to patch qa.criteria.is_build_ready_for_qa would see it return a truthy mock, potentially masking failures.

Consider explicitly setting the default:
Proposed fix
 mock_progress.is_build_complete = MagicMock(return_value=True)
+mock_progress.is_build_ready_for_qa = MagicMock(return_value=True)

🤖 Fix all issues with AI agents

In `@apps/backend/core/progress.py`:
- Around line 136-149: The duplicated logic that reads attempt_history.json and
builds stuck_subtask_ids appears in two places (the shown block and inside
get_next_subtask); extract that logic into a single helper function named
_load_stuck_subtask_ids(spec_dir) that returns a set of subtask IDs, move the
try/except, file path construction (spec_dir / "memory" /
"attempt_history.json"), JSON loading, and the comprehension into that helper,
keep the same exception handling (OSError, json.JSONDecodeError,
UnicodeDecodeError) and return an empty set on error, then replace both call
sites with stuck_subtask_ids = _load_stuck_subtask_ids(spec_dir) so both callers
use the shared implementation.

In `@apps/backend/qa/loop.py`:
- Around line 117-133: The QA readiness failure message currently prints only
completed subtasks using count_subtasks(), which mismatches
is_build_ready_for_qa()'s terminal-state logic; update the failure branch in
qa_loop to call count_subtasks_detailed(spec_dir) (or otherwise compute terminal
states) and print a progress line like "Progress: {terminal}/{total} subtasks in
terminal state" and optionally include a breakdown (completed/failed/stuck) so
the message aligns with is_build_ready_for_qa() semantics; keep
debug_warning("qa_loop", ...) and debug("qa_loop", ...) as-is but use the
detailed counts when formatting the user-facing print.

In `@apps/backend/services/recovery.py`:
- Around line 517-542: The current implementation in the block that opens
implementation_plan.json unconditionally overwrites subtask["notes"] when
marking a subtask failed (in the loop referencing self.spec_dir, subtask_id and
plan["phases"]/subtasks), which loses prior notes; change it to preserve
existing notes by prepending or appending the "Marked as stuck: {reason}" text
to the existing subtask.get("notes", "") (e.g., build a new_notes variable that
combines the new marker and the prior notes with a separator only if prior notes
exist), then set subtask["notes"] = new_notes before writing the updated plan
back to disk, leaving the rest of the error handling (logger.warning on
exceptions) intact.

apps/backend/core/progress.py

coderabbitai · 2026-02-16T08:05:37Z

apps/backend/qa/loop.py

+    # Check if there's pending human feedback that needs to be processed
+    fix_request_file = spec_dir / "QA_FIX_REQUEST.md"
+    has_human_feedback = fix_request_file.exists()
+
+    # Human feedback takes priority — if the user explicitly asked to proceed,
+    # skip the build completeness gate entirely
+    if not has_human_feedback:
+        # Verify build is ready for QA (all subtasks in terminal state)
+        if not is_build_ready_for_qa(spec_dir):
+            debug_warning(
+                "qa_loop", "Build is not ready for QA - subtasks still in progress"
+            )
+            print("\n❌ Build is not ready for QA validation.")
+            completed, total = count_subtasks(spec_dir)
+            debug("qa_loop", "Build progress", completed=completed, total=total)
+            print(f"   Progress: {completed}/{total} subtasks completed")
+            return False


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Find the count_subtasks function definition rg -n 'def count_subtasks' --type=py -A 15

Repository: AndyMik90/Auto-Claude

Length of output: 1873

🏁 Script executed:

#!/bin/bash # Also search for is_build_ready_for_qa to understand the terminal state logic rg -n 'def is_build_ready_for_qa' --type=py -A 20

Repository: AndyMik90/Auto-Claude

Length of output: 1568

🏁 Script executed:

#!/bin/bash # Check what files are in the qa directory find apps/backend/qa -type f -name "*.py" | head -20

Repository: AndyMik90/Auto-Claude

Length of output: 250

🏁 Script executed:

#!/bin/bash # Get full count_subtasks implementation sed -n '33,65p' apps/backend/core/progress.py

Repository: AndyMik90/Auto-Claude

Length of output: 911

🏁 Script executed:

#!/bin/bash # Get full is_build_ready_for_qa implementation sed -n '118,170p' apps/backend/core/progress.py

Repository: AndyMik90/Auto-Claude

Length of output: 1926

🏁 Script executed:

#!/bin/bash # Check count_subtasks_detailed full implementation sed -n '66,117p' apps/backend/core/progress.py

Repository: AndyMik90/Auto-Claude

Length of output: 1370

🏁 Script executed:

#!/bin/bash # Get the context around line 130 in loop.py sed -n '115,135p' apps/backend/qa/loop.py

Repository: AndyMik90/Auto-Claude

Length of output: 1087

🏁 Script executed:

#!/bin/bash # Check if count_subtasks_detailed is used anywhere in loop.py or related files rg -n 'count_subtasks_detailed' --type=py apps/backend/qa/

Repository: AndyMik90/Auto-Claude

Length of output: 47

Update the progress message to reflect terminal state semantics.

When the QA readiness gate fails (line 130), the message shows count_subtasks() output which reports only completed subtasks. However, is_build_ready_for_qa() checks for terminal states (completed, failed, or stuck). This creates a mismatch: if 3 of 5 subtasks are completed, 1 failed, and 1 in-progress, the user sees "Progress: 3/5 subtasks completed" — which doesn't explain why QA is blocked.

Either update the message to show terminal/total count (e.g., "4/5 subtasks in terminal state") to align with the gate's semantics, or use count_subtasks_detailed() to show the breakdown of non-terminal subtasks.

🤖 Prompt for AI Agents

In `@apps/backend/qa/loop.py` around lines 117 - 133, The QA readiness failure message currently prints only completed subtasks using count_subtasks(), which mismatches is_build_ready_for_qa()'s terminal-state logic; update the failure branch in qa_loop to call count_subtasks_detailed(spec_dir) (or otherwise compute terminal states) and print a progress line like "Progress: {terminal}/{total} subtasks in terminal state" and optionally include a breakdown (completed/failed/stuck) so the message aligns with is_build_ready_for_qa() semantics; keep debug_warning("qa_loop", ...) and debug("qa_loop", ...) as-is but use the detailed counts when formatting the user-facing print.

AndyMik90

🤖 Auto Claude PR Review

Merge Verdict: 🟠 NEEDS REVISION

🟠 Needs revision - 2 issue(s) require attention.

2 issue(s) must be addressed (0 required, 2 recommended)

Risk Assessment

Factor	Level	Notes
Complexity	High	Based on lines changed
Security Impact	None	Based on security findings
Scope Coherence	Good	Based on structural review

Findings Summary

Medium: 2 issue(s)

Generated by Auto Claude PR Review

Findings (2 selected of 2 total)

🟡 [2d0df14a360e] [MEDIUM] Duplicated stuck-subtask-loading logic within same file

📁 apps/backend/core/progress.py:136

The new is_build_ready_for_qa() (lines 136-149) duplicates the exact same ~15 lines of code that already exist in get_next_subtask() (lines 478-493) within the same file. Both load stuck subtask IDs from spec_dir / 'memory' / 'attempt_history.json' with identical try/except/set-comprehension logic. This should be extracted into a private helper like _load_stuck_subtask_ids(spec_dir: Path) -> set[str] to follow DRY principles and ensure any future changes to stuck-subtask detection are made in one place. | The new is_build_ready_for_qa() function (lines 136-149) contains identical stuck-subtask-ID loading logic as the existing get_next_subtask() function (lines 478-493) in the same file. Both load attempt_history.json, parse stuck_subtasks, filter by subtask_id key presence, and return a set. This is a prime candidate for a shared private helper like _load_stuck_subtask_ids(spec_dir: Path) -> set[str] to reduce duplication within this module.

Suggested fix:

Extract a private helper at the top of the file:

def _load_stuck_subtask_ids(spec_dir: Path) -> set[str]:
    """Load IDs of subtasks marked as stuck from attempt_history.json."""
    attempt_history_file = spec_dir / "memory" / "attempt_history.json"
    if not attempt_history_file.exists():
        return set()
    try:
        with open(attempt_history_file, encoding="utf-8") as f:
            attempt_history = json.load(f)
        return {
            entry["subtask_id"]
            for entry in attempt_history.get("stuck_subtasks", [])
            if "subtask_id" in entry
        }
    except (OSError, json.JSONDecodeError, UnicodeDecodeError):
        return set()

Then use it in both `is_build_ready_for_qa()` and `get_next_subtask()`.

🟡 [193ab14ede41] [MEDIUM] Plan file written with raw json.dump instead of write_json_atomic

📁 apps/backend/services/recovery.py:537

The new code writes implementation_plan.json using raw open() + json.dump(), but the established codebase pattern for writing this specific file is to use write_json_atomic from core.file_utils. This utility prevents file corruption by writing to a temp file first and atomically replacing. Searched for write_json_atomic usage — found 8+ call sites across agents/session.py, agents/tools_pkg/tools/subtask.py, agents/tools_pkg/tools/qa.py, implementation_plan/plan.py, and spec/validate_pkg/auto_fix.py — all using it for plan file writes. The recovery.py file's own internal files (attempt_history.json, build_commits.json) use raw writes which is fine since they're less critical, but implementation_plan.json is a shared coordination file read by multiple subsystems and should use the atomic write pattern.

Suggested fix:

Import and use write_json_atomic:

from core.file_utils import write_json_atomic

# Then replace the write block:
if updated:
    write_json_atomic(plan_file, plan, indent=2)

This review was generated by Auto Claude.

When a coding agent marks a subtask as "stuck", QA validation would never start because is_build_complete() requires ALL subtasks to have status "completed". This creates a deadlock: coder exits (no more subtasks to work on), but QA never triggers. Changes: - Add is_build_ready_for_qa() that considers builds ready when all subtasks reach a terminal state (completed, failed, or stuck) - Update mark_subtask_stuck() to also set status="failed" in implementation_plan.json, keeping plan file in sync with reality - Reorder QA loop to check human feedback before build completeness, so QA_FIX_REQUEST.md bypasses the build gate as intended - Replace is_build_complete() with is_build_ready_for_qa() in should_run_qa() and CLI qa commands - Add 20 new tests covering is_build_ready_for_qa() edge cases and mark_subtask_stuck() plan update behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Extract duplicated stuck-subtask loading into _load_stuck_subtask_ids() - Write stuck reason to actual_output instead of notes field for consistency with the QA reviewer's expectations - Clarify progress message to mention terminal states (completed/failed/stuck) - Update test assertions to match actual_output field Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace raw json.dump with write_json_atomic when writing implementation_plan.json in mark_subtask_stuck() to prevent file corruption, consistent with 8+ other call sites in the codebase. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

tests/test_qa_criteria.py (1)
76-78: 🧹 Nitpick | 🔵 Trivial

Stale mock: mock_progress.is_build_complete should be updated to is_build_ready_for_qa.

The progress module mock (Line 77) still explicitly sets is_build_complete but qa/criteria.py now imports is_build_ready_for_qa. This works only because MagicMock auto-creates missing attributes. Update the mock to reflect what's actually imported to avoid confusion for future readers.
Proposed fix
 mock_progress = MagicMock()
 mock_progress.count_subtasks = MagicMock(return_value=(3, 3))
-mock_progress.is_build_complete = MagicMock(return_value=True)
+mock_progress.is_build_ready_for_qa = MagicMock(return_value=True)
 sys.modules['progress'] = mock_progress
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_qa_criteria.py` around lines 76 - 78, Replace the stale mocked
attribute on the progress module: update the mock in tests/test_qa_criteria.py
so it defines mock_progress.is_build_ready_for_qa = MagicMock(return_value=True)
instead of mock_progress.is_build_complete; keep mock_progress.count_subtasks
as-is and continue assigning sys.modules['progress'] = mock_progress so
qa/criteria.py imports the real symbol name is_build_ready_for_qa used by the
code under test.
apps/backend/qa/loop.py (1)
160-242: ⚠️ Potential issue | 🟡 Minor

Human feedback flow: well-structured, but one concern on error handling.

Lines 199-227: When the fixer returns "error" with a non-transient error, the QA_FIX_REQUEST.md is deleted (line 219) and the function returns False. This means the user's feedback is permanently lost on a non-transient fixer error (e.g., a bug in the fixer prompt). Since non-transient errors can also be recoverable on retry (e.g., model overload, unexpected tool output), consider preserving the file by default and only deleting it on success — or at minimum, logging the feedback content before deletion so it can be recovered.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/qa/loop.py` around lines 160 - 242, The current logic deletes
the fix_request_file when fix_status == "error" for non-transient errors, which
can permanently lose user feedback; update the error handling around fix_status,
fix_response, and fix_error_info (from run_qa_fixer_session) so that
QA_FIX_REQUEST.md (fix_request_file) is NOT deleted on fixer errors—only delete
it after successful fixes (where debug_success("qa_loop", "Human feedback fixes
applied") runs); alternatively, if you must delete on certain permanent
failures, first persist or log the feedback content (read fix_request_file and
emit via task_event_emitter or processLogger) before unlinking to allow
recovery.
tests/agents/test_agent_flow.py (1)
925-950: 🧹 Nitpick | 🔵 Trivial

Patching is_build_ready_for_qa with side_effect=real_is_build_ready_for_qa is effectively a no-op — consider simplifying.

The stated reason is to guard against mock pollution from other test modules, but side_effect=real_is_build_ready_for_qa just delegates to the real function. If cross-module pollution is the actual concern, a simpler fix would be to ensure proper test isolation (e.g., importlib.reload) or to call is_build_ready_for_qa directly in these tests instead of going through should_run_qa.

Also, importing should_run_qa inside the with patch(...) block on every test is unnecessary — once the module is cached, the import returns the same object. The patch works regardless because it modifies the module-level attribute that should_run_qa looks up at call time.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/agents/test_agent_flow.py` around lines 925 - 950, The patch using
side_effect=real_is_build_ready_for_qa is redundant; remove the with patch(...)
block and either call is_build_ready_for_qa(spec_dir) directly to assert False,
or ensure a clean qa.criteria module by doing import importlib; import
qa.criteria; importlib.reload(qa.criteria) and then import should_run_qa and
assert should_run_qa(spec_dir) is False. Update the test to reference
is_build_ready_for_qa and should_run_qa (and keep save_implementation_plan
usage) so the intent is clear without the no-op patch.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/backend/qa/loop.py`:
- Line 31: The import line brings in count_subtasks but that function is not
actually used (only referenced in a misleading progress message); update
apps/backend/qa/loop.py by either removing count_subtasks from the import
(retain is_build_ready_for_qa) or actually use count_subtasks to compute and
display a correct progress value in the progress message—adjust the progress
message logic where it references count_subtasks so the import becomes
necessary, otherwise delete count_subtasks from the "from progress import
count_subtasks, is_build_ready_for_qa" statement to avoid an unused import.

In `@apps/backend/services/recovery.py`:
- Around line 519-547: The inner variable named existing in the
implementation_plan.json update block shadows an outer variable also named
existing (the list of stuck entries); rename the inner variable (e.g., to
prior_output) and update its usage in the f-string and assignment
(subtask["actual_output"] = ...) so the outer existing remains unshadowed and
the intent is clearer; locate the code around the loop that iterates
phase.get("subtasks", []) and change the inner existing → prior_output and any
references within that scope (including the conditional
f"{stuck_note}\n{existing}" if existing else stuck_note).

In `@tests/test_progress_qa_readiness.py`:
- Around line 41-418: Add a new unit test that covers the scenario where a
subtask is marked failed in the plan and also appears as stuck in
attempt_history to ensure is_build_ready_for_qa handles both signals correctly;
specifically, create an implementation_plan.json entry with a subtask having
"status": "failed" and an attempt_history.json containing a matching
stuck_subtasks entry (with "subtask_id" equal to that subtask), then assert
is_build_ready_for_qa(spec_dir) returns True and does not double-count or flip
behavior—this mirrors the output of mark_subtask_stuck and validates the
combined end-to-end path.

---

Outside diff comments:
In `@apps/backend/qa/loop.py`:
- Around line 160-242: The current logic deletes the fix_request_file when
fix_status == "error" for non-transient errors, which can permanently lose user
feedback; update the error handling around fix_status, fix_response, and
fix_error_info (from run_qa_fixer_session) so that QA_FIX_REQUEST.md
(fix_request_file) is NOT deleted on fixer errors—only delete it after
successful fixes (where debug_success("qa_loop", "Human feedback fixes applied")
runs); alternatively, if you must delete on certain permanent failures, first
persist or log the feedback content (read fix_request_file and emit via
task_event_emitter or processLogger) before unlinking to allow recovery.

In `@tests/agents/test_agent_flow.py`:
- Around line 925-950: The patch using side_effect=real_is_build_ready_for_qa is
redundant; remove the with patch(...) block and either call
is_build_ready_for_qa(spec_dir) directly to assert False, or ensure a clean
qa.criteria module by doing import importlib; import qa.criteria;
importlib.reload(qa.criteria) and then import should_run_qa and assert
should_run_qa(spec_dir) is False. Update the test to reference
is_build_ready_for_qa and should_run_qa (and keep save_implementation_plan
usage) so the intent is clear without the no-op patch.

In `@tests/test_qa_criteria.py`:
- Around line 76-78: Replace the stale mocked attribute on the progress module:
update the mock in tests/test_qa_criteria.py so it defines
mock_progress.is_build_ready_for_qa = MagicMock(return_value=True) instead of
mock_progress.is_build_complete; keep mock_progress.count_subtasks as-is and
continue assigning sys.modules['progress'] = mock_progress so qa/criteria.py
imports the real symbol name is_build_ready_for_qa used by the code under test.

---

Duplicate comments:
In `@apps/backend/qa/loop.py`:
- Around line 117-135: The progress message uses count_subtasks() which only
returns (completed, total) causing the printed "terminal state" count to be
wrong; update the logic in the qa loop (the block checking
is_build_ready_for_qa(spec_dir)) to call count_subtasks_detailed(spec_dir) (or
another helper that returns (terminal_count, total_count)) and use
terminal_count when composing the print and debug calls (replace completed with
terminal_count in the f-string and related debug("qa_loop", "Build progress",
...) call), or alternatively expose is_build_ready_for_qa's internal computation
to return terminal counts so the progress line accurately shows
terminal_count/total.

coderabbitai · 2026-02-17T14:38:45Z

apps/backend/qa/loop.py

 )
 from phase_event import ExecutionPhase, emit_phase
-from progress import count_subtasks, is_build_complete
+from progress import count_subtasks, is_build_ready_for_qa


🧹 Nitpick | 🔵 Trivial

Import of count_subtasks is retained but only used in the misleading progress message.

count_subtasks is still imported on line 31 alongside is_build_ready_for_qa. Once the progress message is fixed (see above), this import may become unused. Keep this in mind when addressing the progress message.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/backend/qa/loop.py` at line 31, The import line brings in count_subtasks but that function is not actually used (only referenced in a misleading progress message); update apps/backend/qa/loop.py by either removing count_subtasks from the import (retain is_build_ready_for_qa) or actually use count_subtasks to compute and display a correct progress value in the progress message—adjust the progress message logic where it references count_subtasks so the import becomes necessary, otherwise delete count_subtasks from the "from progress import count_subtasks, is_build_ready_for_qa" statement to avoid an unused import.

coderabbitai · 2026-02-17T14:38:45Z

apps/backend/services/recovery.py

+        # Also update the subtask status in implementation_plan.json
+        # so that other callers (like is_build_ready_for_qa) see accurate status
+        try:
+            plan_file = self.spec_dir / "implementation_plan.json"
+            if plan_file.exists():
+                with open(plan_file, encoding="utf-8") as f:
+                    plan = json.load(f)
+
+                updated = False
+                for phase in plan.get("phases", []):
+                    for subtask in phase.get("subtasks", []):
+                        if subtask.get("id") == subtask_id:
+                            subtask["status"] = "failed"
+                            stuck_note = f"Marked as stuck: {reason}"
+                            existing = subtask.get("actual_output", "")
+                            subtask["actual_output"] = (
+                                f"{stuck_note}\n{existing}" if existing else stuck_note
+                            )
+                            updated = True
+                            break
+                    if updated:
+                        break
+
+                if updated:
+                    write_json_atomic(plan_file, plan, indent=2)
+        except (OSError, json.JSONDecodeError, UnicodeDecodeError) as e:
+            logger.warning(
+                f"Failed to update implementation_plan.json for stuck subtask {subtask_id}: {e}"
+            )


🧹 Nitpick | 🔵 Trivial

Variable existing shadows outer scope on Line 533.

Line 507 defines existing (list of stuck entries matching subtask_id), and Line 533 redefines existing (the actual_output string). While not a bug — the outer existing is no longer needed — it hurts readability. Consider renaming the inner one, e.g. prior_output.

Proposed rename

- existing = subtask.get("actual_output", "") + prior_output = subtask.get("actual_output", "") subtask["actual_output"] = ( - f"{stuck_note}\n{existing}" if existing else stuck_note + f"{stuck_note}\n{prior_output}" if prior_output else stuck_note )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Also update the subtask status in implementation_plan.json

# so that other callers (like is_build_ready_for_qa) see accurate status

try:

plan_file = self.spec_dir / "implementation_plan.json"

if plan_file.exists():

with open(plan_file, encoding="utf-8") as f:

plan = json.load(f)

updated = False

for phase in plan.get("phases", []):

for subtask in phase.get("subtasks", []):

if subtask.get("id") == subtask_id:

subtask["status"] = "failed"

stuck_note = f"Marked as stuck: {reason}"

existing = subtask.get("actual_output", "")

subtask["actual_output"] = (

f"{stuck_note}\n{existing}" if existing else stuck_note

)

updated = True

break

if updated:

break

if updated:

write_json_atomic(plan_file, plan, indent=2)

except (OSError, json.JSONDecodeError, UnicodeDecodeError) as e:

logger.warning(

f"Failed to update implementation_plan.json for stuck subtask {subtask_id}: {e}"

)

# Also update the subtask status in implementation_plan.json

# so that other callers (like is_build_ready_for_qa) see accurate status

try:

plan_file = self.spec_dir / "implementation_plan.json"

if plan_file.exists():

with open(plan_file, encoding="utf-8") as f:

plan = json.load(f)

updated = False

for phase in plan.get("phases", []):

for subtask in phase.get("subtasks", []):

if subtask.get("id") == subtask_id:

subtask["status"] = "failed"

stuck_note = f"Marked as stuck: {reason}"

prior_output = subtask.get("actual_output", "")

subtask["actual_output"] = (

f"{stuck_note}\n{prior_output}" if prior_output else stuck_note

)

updated = True

break

if updated:

break

if updated:

write_json_atomic(plan_file, plan, indent=2)

except (OSError, json.JSONDecodeError, UnicodeDecodeError) as e:

logger.warning(

f"Failed to update implementation_plan.json for stuck subtask {subtask_id}: {e}"

)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/backend/services/recovery.py` around lines 519 - 547, The inner variable named existing in the implementation_plan.json update block shadows an outer variable also named existing (the list of stuck entries); rename the inner variable (e.g., to prior_output) and update its usage in the f-string and assignment (subtask["actual_output"] = ...) so the outer existing remains unshadowed and the intent is clearer; locate the code around the loop that iterates phase.get("subtasks", []) and change the inner existing → prior_output and any references within that scope (including the conditional f"{stuck_note}\n{existing}" if existing else stuck_note).

coderabbitai · 2026-02-17T14:38:45Z

tests/test_progress_qa_readiness.py

+class TestIsBuildReadyForQA:
+    """Tests for is_build_ready_for_qa function."""
+
+    def test_all_subtasks_completed(self, spec_dir: Path):
+        """Returns True when all subtasks are completed."""
+        plan = {
+            "feature": "Test Feature",
+            "phases": [
+                {
+                    "phase": 1,
+                    "name": "Phase 1",
+                    "subtasks": [
+                        {"id": "subtask-1-1", "status": "completed"},
+                        {"id": "subtask-1-2", "status": "completed"},
+                    ],
+                },
+                {
+                    "phase": 2,
+                    "name": "Phase 2",
+                    "subtasks": [
+                        {"id": "subtask-2-1", "status": "completed"},
+                    ],
+                },
+            ],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan))
+
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is True
+
+    def test_mix_completed_and_pending(self, spec_dir: Path):
+        """Returns False when some subtasks are still pending."""
+        plan = {
+            "feature": "Test Feature",
+            "phases": [
+                {
+                    "phase": 1,
+                    "name": "Phase 1",
+                    "subtasks": [
+                        {"id": "subtask-1-1", "status": "completed"},
+                        {"id": "subtask-1-2", "status": "pending"},
+                    ],
+                },
+            ],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan))
+
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is False
+
+    def test_mix_completed_and_failed(self, spec_dir: Path):
+        """Returns True when all subtasks are terminal (completed + failed)."""
+        plan = {
+            "feature": "Test Feature",
+            "phases": [
+                {
+                    "phase": 1,
+                    "name": "Phase 1",
+                    "subtasks": [
+                        {"id": "subtask-1-1", "status": "completed"},
+                        {"id": "subtask-1-2", "status": "failed"},
+                    ],
+                },
+                {
+                    "phase": 2,
+                    "name": "Phase 2",
+                    "subtasks": [
+                        {"id": "subtask-2-1", "status": "completed"},
+                        {"id": "subtask-2-2", "status": "failed"},
+                    ],
+                },
+            ],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan))
+
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is True
+
+    def test_subtask_stuck_in_attempt_history(self, spec_dir: Path, memory_dir: Path):
+        """Returns True when subtask is marked stuck in attempt_history even if plan shows pending."""
+        plan = {
+            "feature": "Test Feature",
+            "phases": [
+                {
+                    "phase": 1,
+                    "name": "Phase 1",
+                    "subtasks": [
+                        {"id": "subtask-1-1", "status": "completed"},
+                        {"id": "subtask-1-2", "status": "pending"},  # Stuck but plan not updated
+                    ],
+                },
+            ],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan))
+
+        # Create attempt_history with stuck subtask
+        attempt_history = {
+            "stuck_subtasks": [
+                {
+                    "subtask_id": "subtask-1-2",
+                    "reason": "Circular fix after 3 attempts",
+                    "escalated_at": "2024-01-01T12:00:00Z",
+                    "attempt_count": 3,
+                }
+            ],
+            "subtasks": {},
+        }
+        history_file = memory_dir / "attempt_history.json"
+        history_file.write_text(json.dumps(attempt_history))
+
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is True
+
+    def test_no_plan_file(self, spec_dir: Path):
+        """Returns False when implementation_plan.json doesn't exist."""
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is False
+
+    def test_empty_phases(self, spec_dir: Path):
+        """Returns False when plan has no subtasks (total=0)."""
+        plan = {
+            "feature": "Test Feature",
+            "phases": [],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan))
+
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is False
+
+    def test_phases_with_no_subtasks(self, spec_dir: Path):
+        """Returns False when phases exist but contain no subtasks."""
+        plan = {
+            "feature": "Test Feature",
+            "phases": [
+                {
+                    "phase": 1,
+                    "name": "Phase 1",
+                    "subtasks": [],
+                },
+            ],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan))
+
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is False
+
+    def test_no_attempt_history_file(self, spec_dir: Path):
+        """Returns True based on plan file alone when attempt_history.json doesn't exist."""
+        plan = {
+            "feature": "Test Feature",
+            "phases": [
+                {
+                    "phase": 1,
+                    "name": "Phase 1",
+                    "subtasks": [
+                        {"id": "subtask-1-1", "status": "completed"},
+                        {"id": "subtask-1-2", "status": "failed"},
+                    ],
+                },
+            ],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan))
+
+        # No attempt_history.json created
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is True
+
+    def test_invalid_json_in_attempt_history(self, spec_dir: Path, memory_dir: Path):
+        """Gracefully handles invalid JSON in attempt_history and falls back to plan-only check."""
+        plan = {
+            "feature": "Test Feature",
+            "phases": [
+                {
+                    "phase": 1,
+                    "name": "Phase 1",
+                    "subtasks": [
+                        {"id": "subtask-1-1", "status": "completed"},
+                    ],
+                },
+            ],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan))
+
+        # Create invalid JSON in attempt_history
+        history_file = memory_dir / "attempt_history.json"
+        history_file.write_text("{ invalid json }")
+
+        # Should fallback to plan-only check and return True
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is True
+
+    def test_invalid_json_in_plan(self, spec_dir: Path):
+        """Returns False when implementation_plan.json contains invalid JSON."""
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text("{ invalid json }")
+
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is False
+
+    def test_empty_plan_file(self, spec_dir: Path):
+        """Returns False when implementation_plan.json is empty."""
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text("")
+
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is False
+
+    def test_multiple_stuck_subtasks(self, spec_dir: Path, memory_dir: Path):
+        """Returns True when multiple subtasks are stuck in attempt_history."""
+        plan = {
+            "feature": "Test Feature",
+            "phases": [
+                {
+                    "phase": 1,
+                    "name": "Phase 1",
+                    "subtasks": [
+                        {"id": "subtask-1-1", "status": "pending"},
+                        {"id": "subtask-1-2", "status": "pending"},
+                        {"id": "subtask-1-3", "status": "completed"},
+                    ],
+                },
+            ],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan))
+
+        # Mark two subtasks as stuck
+        attempt_history = {
+            "stuck_subtasks": [
+                {"subtask_id": "subtask-1-1", "reason": "Error 1"},
+                {"subtask_id": "subtask-1-2", "reason": "Error 2"},
+            ],
+            "subtasks": {},
+        }
+        history_file = memory_dir / "attempt_history.json"
+        history_file.write_text(json.dumps(attempt_history))
+
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is True
+
+    def test_mix_of_all_terminal_states(self, spec_dir: Path, memory_dir: Path):
+        """Returns True with completed, failed, and stuck subtasks."""
+        plan = {
+            "feature": "Test Feature",
+            "phases": [
+                {
+                    "phase": 1,
+                    "name": "Phase 1",
+                    "subtasks": [
+                        {"id": "subtask-1-1", "status": "completed"},
+                        {"id": "subtask-1-2", "status": "failed"},
+                        {"id": "subtask-1-3", "status": "pending"},  # Will be stuck
+                    ],
+                },
+            ],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan))
+
+        attempt_history = {
+            "stuck_subtasks": [
+                {"subtask_id": "subtask-1-3", "reason": "Stuck"},
+            ],
+            "subtasks": {},
+        }
+        history_file = memory_dir / "attempt_history.json"
+        history_file.write_text(json.dumps(attempt_history))
+
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is True
+
+    def test_in_progress_status(self, spec_dir: Path):
+        """Returns False when subtasks are in_progress."""
+        plan = {
+            "feature": "Test Feature",
+            "phases": [
+                {
+                    "phase": 1,
+                    "name": "Phase 1",
+                    "subtasks": [
+                        {"id": "subtask-1-1", "status": "completed"},
+                        {"id": "subtask-1-2", "status": "in_progress"},
+                    ],
+                },
+            ],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan))
+
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is False
+
+    def test_missing_status_field(self, spec_dir: Path):
+        """Returns False when subtask has no status field (defaults to pending)."""
+        plan = {
+            "feature": "Test Feature",
+            "phases": [
+                {
+                    "phase": 1,
+                    "name": "Phase 1",
+                    "subtasks": [
+                        {"id": "subtask-1-1", "status": "completed"},
+                        {"id": "subtask-1-2"},  # No status field
+                    ],
+                },
+            ],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan))
+
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is False
+
+    def test_stuck_subtask_without_id_field(self, spec_dir: Path, memory_dir: Path):
+        """Ignores stuck subtasks without subtask_id field in attempt_history."""
+        plan = {
+            "feature": "Test Feature",
+            "phases": [
+                {
+                    "phase": 1,
+                    "name": "Phase 1",
+                    "subtasks": [
+                        {"id": "subtask-1-1", "status": "pending"},
+                    ],
+                },
+            ],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan))
+
+        # Malformed stuck subtask entry without subtask_id
+        attempt_history = {
+            "stuck_subtasks": [
+                {"reason": "Error", "escalated_at": "2024-01-01T12:00:00Z"}
+            ],
+            "subtasks": {},
+        }
+        history_file = memory_dir / "attempt_history.json"
+        history_file.write_text(json.dumps(attempt_history))
+
+        # Should return False since subtask-1-1 is still pending
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is False
+
+    def test_unicode_encoding_in_files(self, spec_dir: Path, memory_dir: Path):
+        """Handles UTF-8 encoded content correctly."""
+        plan = {
+            "feature": "Test Feature 测试功能",
+            "phases": [
+                {
+                    "phase": 1,
+                    "name": "Phase 1",
+                    "subtasks": [
+                        {"id": "subtask-1-1", "status": "completed", "notes": "完成"},
+                    ],
+                },
+            ],
+        }
+        plan_file = spec_dir / "implementation_plan.json"
+        plan_file.write_text(json.dumps(plan, ensure_ascii=False), encoding="utf-8")
+
+        attempt_history = {
+            "stuck_subtasks": [],
+            "subtasks": {},
+        }
+        history_file = memory_dir / "attempt_history.json"
+        history_file.write_text(json.dumps(attempt_history, ensure_ascii=False), encoding="utf-8")
+
+        result = is_build_ready_for_qa(spec_dir)
+        assert result is True


🧹 Nitpick | 🔵 Trivial

Good test coverage — consider adding a test for stuck subtasks with status="failed" in the plan.

The PR description states that mark_subtask_stuck() now sets status="failed" in implementation_plan.json. This means in practice, stuck subtasks will have status="failed" in the plan and appear in attempt_history.json. Consider adding a test that verifies this combined state (plan has status="failed" + history has stuck_subtasks entry) to validate the end-to-end path after mark_subtask_stuck() runs.

This would also confirm that is_build_ready_for_qa doesn't double-count or behave unexpectedly when both signals are present.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/test_progress_qa_readiness.py` around lines 41 - 418, Add a new unit test that covers the scenario where a subtask is marked failed in the plan and also appears as stuck in attempt_history to ensure is_build_ready_for_qa handles both signals correctly; specifically, create an implementation_plan.json entry with a subtask having "status": "failed" and an attempt_history.json containing a matching stuck_subtasks entry (with "subtask_id" equal to that subtask), then assert is_build_ready_for_qa(spec_dir) returns True and does not double-count or flip behavior—this mirrors the output of mark_subtask_stuck and validates the combined end-to-end path.

github-actions bot added area/backend This is backend only bug Something isn't working size/L Large (500-999 lines) labels Feb 16, 2026

gemini-code-assist bot reviewed Feb 16, 2026

View reviewed changes

github-advanced-security bot found potential problems Feb 16, 2026

View reviewed changes

apps/backend/core/progress.py Fixed Show fixed Hide fixed

coderabbitai bot reviewed Feb 16, 2026

View reviewed changes

AndyMik90 self-assigned this Feb 16, 2026

AndyMik90 commented Feb 16, 2026

View reviewed changes

AndyMik90 and others added 3 commits February 17, 2026 15:33

AndyMik90 force-pushed the fix/qa-validation-stuck-subtasks branch from 93a5d4e to 3bf5a1f Compare February 17, 2026 14:35

coderabbitai bot reviewed Feb 17, 2026

View reviewed changes

nsalvacao mentioned this pull request Feb 17, 2026

intake(lote-B): import upstream PR #1853 nsalvacao/Auto-Claude#9

Open

4 tasks

		subtask["status"] = "failed"
		subtask["notes"] = f"Marked as stuck: {reason}"

Uh oh!

Conversation

AndyMik90 commented Feb 16, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Changes

Test plan

Summary by CodeRabbit

Uh oh!

gemini-code-assist bot commented Feb 16, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

coderabbitai bot commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

AndyMik90 left a comment

Choose a reason for hiding this comment

🤖 Auto Claude PR Review

Merge Verdict: 🟠 NEEDS REVISION

Risk Assessment

Findings Summary

Findings (2 selected of 2 total)

🟡 [2d0df14a360e] [MEDIUM] Duplicated stuck-subtask-loading logic within same file

🟡 [193ab14ede41] [MEDIUM] Plan file written with raw json.dump instead of write_json_atomic

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

AndyMik90 commented Feb 16, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 16, 2026 •

edited

Loading